12        Bioinformatics

a Phred quality score to measure the accuracy of each base called. The Phred quality score

(Q-score) transforms the probability of calling a base wrongly into an integer score that is

easy to interpret. The Phred score is defined as

$$p = 10^{-Q/10} \tag{1.3}$$

$$Q = -10 \log_{10}(p) \tag{1.4}$$

where p is the probability of the base call being wrong as estimated by the caller software.
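The two conversions in Equations 1.3 and 1.4 can be sketched in a few lines of Python. The function names `error_prob_to_phred` and `phred_to_error_prob` are illustrative choices, not part of any base-calling software.

```python
import math


def error_prob_to_phred(p: float) -> float:
    """Equation 1.4: Q = -10 * log10(p), where p is the error probability."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in the interval (0, 1]")
    return -10.0 * math.log10(p)


def phred_to_error_prob(q: float) -> float:
    """Equation 1.3: p = 10 ** (-Q / 10)."""
    return 10.0 ** (-q / 10.0)


# Print the error probability and call accuracy for a few common scores.
for q in (10, 20, 30):
    p = phred_to_error_prob(q)
    print(f"Q={q}: p={p:.5f}, accuracy={100 * (1 - p):.2f}%")
```

Base callers report Q as an integer, so in practice the value returned by `error_prob_to_phred` would be rounded before it is stored.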

Phred quality scores are encoded as single ASCII characters. Every ASCII character has an associated decimal code. Because codes 0 to 31 are non-printable control characters and code 32 is the space, the first visible character is the exclamation mark “!”, whose decimal code is 33. The encoding in which “!” represents Q=0 is therefore called Phred+33 encoding. Illumina 1.8 and later versions use this Phred+33 encoding (Q33) to encode the base call quality in FASTQ files. Older Illumina versions (e.g., Solexa) used an offset of 64 (Phred+64 encoding), in which the character “@”, whose decimal code is 64, corresponds to Q=0. Table 1.1 shows the Phred quality score (Q), the corresponding error probability (p), and the decimal code and ASCII character under Phred+33. For instance, when the probability of a wrong base call is 0.1, the Phred score is 10 (Q=10), but instead of the number 10, that quality score is written as the plus sign “+” (decimal code 43 = 10 + 33).
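As a minimal sketch of this encoding, the Python snippet below converts between a FASTQ quality string and a list of Phred scores. The helper names `decode_quality` and `encode_quality` and the example strings are hypothetical. The `offset` parameter is assumed to be 33 for Phred+33 or 64 for Phred+64.

```python
def decode_quality(quality: str, offset: int = 33) -> list[int]:
    """Convert a quality string to Phred scores by subtracting the offset."""
    scores = [ord(ch) - offset for ch in quality]
    if any(q < 0 for q in scores):
        # A negative score means the string was not written with this offset.
        raise ValueError(f"character below offset {offset} in {quality!r}")
    return scores


def encode_quality(scores: list[int], offset: int = 33) -> str:
    """Convert Phred scores to a quality string by adding the offset."""
    return "".join(chr(q + offset) for q in scores)


print(decode_quality("!+5?I"))           # Phred+33: [0, 10, 20, 30, 40]
print(decode_quality("@Jh", offset=64))  # Phred+64: [0, 10, 40]
print(encode_quality([10]))              # "+", the Q=10 example in the text
```

Decoding with the wrong offset does not always fail: a Phred+64 string read as Phred+33 simply yields scores that are 31 too high, so the encoding of a file should be established before its qualities are interpreted.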

Higher Q scores indicate a smaller probability of error, while lower Q scores indicate low base call quality, meaning that the base is more likely to have been called wrongly. For instance, a quality score of 20 indicates a chance of 1 error in 100 base calls, corresponding to 99% call accuracy. In general, a Q-score of 30 is considered a benchmark

TABLE 1.1  Phred Quality Score and ASCII_BASE 33 (Q33)

Q    p         ASCII      Q    p         ASCII      Q    p         ASCII
0    1.00000   33  !      15   0.03162   48  0      30   0.00100   63  ?
1    0.79433   34  "      16   0.02512   49  1      31   0.00079   64  @
2    0.63096   35  #      17   0.01995   50  2      32   0.00063   65  A
3    0.50119   36  $      18   0.01585   51  3      33   0.00050   66  B
4    0.39811   37  %      19   0.01259   52  4      34   0.00040   67  C
5    0.31623   38  &      20   0.01000   53  5      35   0.00032   68  D
6    0.25119   39  '      21   0.00794   54  6      36   0.00025   69  E
7    0.19953   40  (      22   0.00631   55  7      37   0.00020   70  F
8    0.15849   41  )      23   0.00501   56  8      38   0.00016   71  G
9    0.12589   42  *      24   0.00398   57  9      39   0.00013   72  H
10   0.10000   43  +      25   0.00316   58  :      40   0.00010   73  I
11   0.07943   44  ,      26   0.00251   59  ;      41   0.00008   74  J
12   0.06310   45  -      27   0.00200   60  <      42   0.00006   75  K
13   0.05012   46  .      28   0.00158   61  =      43   0.00005   76  L
14   0.03981   47  /      29   0.00126   62  >      44   0.00004   77  M